End-to-End Evaluation of Machine Interpretation Systems: A Graphical Evaluation Tool
Authors
Abstract
VERBMOBIL, a long-term project of the Federal Ministry of Education, Science, Research and Technology, aims at developing a mobile translation system for spontaneous speech. The source-language input consists of human speech (English, German or Japanese); the translation (bidirectional English-German and Japanese-German) and the target-language output are produced by the VERBMOBIL system. Owing to the innovative character of the project, new methods for end-to-end evaluation had to be developed by a subproject established especially for this purpose. In this paper we present criteria for the evaluation of speech-to-speech translation systems and a tool for judging translation quality, the Graphical Evaluation Tool (GET).

1 This work was funded by the Federal Ministry of Education, Science, Research and Technology (BMBF) in the framework of the VERBMOBIL project under Grant 01 IV 101 A/O and, in the framework of the SFB 538 Mehrsprachigkeit (Collaborative Research Center No. 538 Multilingualism), by the Deutsche Forschungsgemeinschaft (DFG). The responsibility for the contents of this study lies with the authors.
2 To simplify the presentation of this paper, we refer only to the language pair German-English.

Introduction

The performance of an evaluation is very often driven by the characteristics of the system to be judged (Andenfilger, 1994). For the VERBMOBIL project, the evaluation should meet three aspects:
• the needs of the developers,
• the needs of the user,
• the constraints on the evaluation of translation quality in general.
In our concept and performance of the evaluation we tried to combine these three aspects, but one should keep in mind that the constraints on translation quality in general were originally meant to describe human translation with all its varieties and specific stylistic features. In the special case of machine interpretation, only texts from limited domains can be transferred so far, so in our view it seems legitimate to simplify some of the procedures applied to the evaluation of human translation. An evaluation method based on any well-known standard (EAGLES, 1995; Sparck Jones and Galliers, 1996; Manzi, 1996) could not have integrated the three aspects cited above, as traditional evaluation methods are intended for comparative evaluations rather than for the investigation of a system during its development. Therefore, to meet our requirements, we developed an integrated methodology and a tool for speech-to-speech quality evaluation which also allows easy access to the data.
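As a rough illustration of the kind of data a graphical evaluation tool has to manage, the sketch below models a per-turn quality judgement for a speech-to-speech translation. The record fields, the three-way rating scale and the helper function are illustrative assumptions for this sketch, not the scheme actually used in VERBMOBIL or GET.

```python
# Hypothetical sketch of per-turn translation quality judgements as a
# GET-like tool might store them; all names and the rating scale are assumed.
from dataclasses import dataclass
from enum import Enum


class Judgement(Enum):
    CORRECT = "correct"            # meaning fully preserved
    APPROXIMATE = "approximate"    # meaning only partly preserved
    INCORRECT = "incorrect"        # meaning lost or distorted


@dataclass
class TurnEvaluation:
    dialogue_id: str       # recorded dialogue the turn belongs to
    turn_id: int           # position of the turn within the dialogue
    source: str            # transcribed source-language utterance
    translation: str       # target-language output of the system
    judgement: Judgement   # human judgement of the translation quality


def correctness_rate(turns: list[TurnEvaluation]) -> float:
    """Share of turns judged fully correct: one simple end-to-end figure."""
    if not turns:
        return 0.0
    return sum(t.judgement is Judgement.CORRECT for t in turns) / len(turns)
```

Storing judgements per dialogue turn in this way would also cover the "easy access to the data" requirement mentioned above, since aggregate figures can be recomputed from the same records.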
Similar resources
Performance Evaluation of Parallel Programs on the Data Diffusion Machine
A tool set for the monitoring and performance evaluation of parallel programs has been developed for the Data Diffusion Machine (DDM), a virtual shared memory architecture. The tool set has a layered structure, allowing the user to observe the machine at various levels of detail. The tools are built on top of a software emulation of the DDM. This emulator provides realistic timings because certain part...
A Graphical Pronoun Analysis Tool for the PROTEST Pronoun Evaluation Test Suite
We present a graphical pronoun analysis tool and a set of guidelines for manual evaluation to be used with the PROTEST pronoun test suite for machine translation (MT). The tool provides a means for researchers to evaluate the performance of their MT systems and browse individual pronoun translations. MT systems may be evaluated automatically by comparing the translation of the test suite pronou...
Constructing a clinical curriculum evaluation tool based on community orientation strategy (A guide for application)
Introduction: SPICES is an approach to assist curriculum planners in planning, reviewing or revising a curriculum. However, few published papers in the literature describe the application of the SPICES criteria to curriculum evaluation. The goal of this study was the development of a curriculum evaluation tool based on the community-based strategy of the SPICES model. Method: This developmental ...
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are a central component of Machine Translation (MT) engines, as engines are developed through frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from the Lexical Similarity set on machine tra...
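The validity check described in this abstract amounts to measuring how well automatic metric scores track human judgements. A minimal sketch of such a segment-level correlation follows; the score lists are made up for illustration and are not data from the cited study.

```python
# Minimal sketch: Pearson correlation between automatic metric scores and
# human ratings, the usual way metric validity is quantified.
from statistics import mean


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical segment-level metric scores and human adequacy ratings.
metric_scores = [0.42, 0.55, 0.31, 0.68]
human_ratings = [3.0, 4.0, 2.0, 4.5]
print(pearson(metric_scores, human_ratings))
```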
Evaluation of Machine Translation and its Evaluation
Evaluation of MT evaluation measures is limited by inconsistent human judgment data. Nonetheless, machine translation can be evaluated using the well-known measures precision, recall, and their average, the F-measure. The unigram-based F-measure has significantly higher correlation with human judgments than recently proposed alternatives. More importantly, this standard measure has an intuitive ...
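For readers unfamiliar with the measure, a minimal sketch of the unigram-based F-measure follows: unigram precision and recall are computed against a single reference translation (with clipped counts) and combined as their harmonic mean. The example sentences are made up.

```python
# Minimal sketch of a unigram-based F-measure against one reference translation.
from collections import Counter


def unigram_f_measure(candidate: str, reference: str) -> float:
    """Unigram precision and recall, combined as their harmonic mean (F1)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())   # matching unigrams, clipped per type
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(unigram_f_measure("the cat sat on the mat", "a cat sat on a mat"))  # ~0.67
```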
Journal title:
Volume  Issue
Pages  -
Publication date: 2000